A machine learning pipeline to improve De Bruijn graph metatranscriptomic assemblies
Abstract
Motivation: As metatranscriptomic assemblies grow in importance, improving their quality while keeping their size manageable has become essential, since every application built on metatranscriptomic assembly benefits. In this paper, we propose a pipeline that filters de novo assemblies while preserving or improving their quality. The original assemblies are based on De Bruijn graphs and were created with Oases. Auxiliary scripts that report statistics on any kind of metatranscriptomic assembly are integrated into the pipeline as well. Results: Experimental results show that the pipeline improved the accuracy of the assemblies by up to 6+%, in addition to filtering out 5000+ transcripts from 6 original assemblies, each made up of 21000+ transcripts. The high precision of the filtered assemblies and the reasonable running time of the pipeline make it a potential postprocessing step for different de novo assemblies. Availability: All pipeline scripts are publicly available at https://sourceforge.net/projects/metatranspipeline/files/ Contact: [email protected]
Similar resources
Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis
MOTIVATION Metagenomics research has accelerated the studies of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory ch...
A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes ...
HINGE: long-read assembly achieves optimal repeat resolution.
Long-read sequencing technologies have the potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce misassemblies, and conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that seeks to achieve optimal repeat resolution...
MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning
The assembly of multiple genomes from mixed sequence reads is a bottleneck in metagenomic analysis. A single-genome assembly program (assembler) is not capable of resolving metagenome sequences, so assemblers designed specifically for metagenomics have been developed. MetaVelvet is an extension of the single-genome assembler Velvet. It has been proved to generate assemblies with higher N50 scor...
AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
MOTIVATION De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them ...